First I started with selecting variables that are relevant for the task 1. Those variables included examinee ID’s, four Emmersion English assessment scores on speaking, writing, reading, and listening, and two human rated scores on speaking and writing, and one combined score, and one summer level score which is used to suggest next level of English class. The final data consisted with 9 variables and 127 observations and there were no missing values in the data.
| type | n.students | Mean | SD | Min | Max | Range | Skewness |
|---|---|---|---|---|---|---|---|
| HRSPEAK | 127 | 4.24 | 1.23 | 1.23 | 6.60 | 5.37 | -0.25 |
| EMSPEAKING | 127 | 6.16 | 1.13 | 3.90 | 9.10 | 5.20 | 0.23 |
| HRWRITE | 127 | 4.21 | 1.13 | 1.19 | 6.70 | 5.51 | -0.19 |
| EMWRITING | 127 | 5.37 | 1.51 | 2.20 | 10.00 | 7.80 | 0.23 |
| EMREADING | 127 | 602.42 | 105.46 | 303.00 | 729.00 | 426.00 | -0.78 |
| EMLISTENING | 127 | 469.49 | 113.49 | 144.00 | 657.00 | 513.00 | -0.49 |
| CombinedPlacementTestBattery | 127 | 4.25 | 1.07 | 1.18 | 5.97 | 4.79 | -0.73 |
| SummerLEVEL | 127 | 4.19 | 1.01 | 1.00 | 6.00 | 5.00 | -0.60 |
Next, I looked at descriptive statistics of each score. As you can see in this table, the Emmersion reading and listening scores are on a very different scale compared to the rest of the scores. So I standardized all eight score variables in order to make these scores being on a same scale and to make them comparable.
One thing to note here is that standardizing doesn’t change any of the shape of the original distribution. These two plots show exactly what happens after standardizing – the distribution stays the same but only the scale on the x-axis changes.
| type | n.students | Mean | SD | Min | Max | Range | Skewness |
|---|---|---|---|---|---|---|---|
| HRSPEAK | 127 | 0 | 1 | -2.45 | 1.92 | 4.37 | -0.25 |
| EMSPEAKING | 127 | 0 | 1 | -2.00 | 2.61 | 4.61 | 0.23 |
| HRWRITE | 127 | 0 | 1 | -2.68 | 2.21 | 4.89 | -0.19 |
| EMWRITING | 127 | 0 | 1 | -2.10 | 3.06 | 5.16 | 0.23 |
| EMREADING | 127 | 0 | 1 | -2.84 | 1.20 | 4.04 | -0.78 |
| EMLISTENING | 127 | 0 | 1 | -2.87 | 1.65 | 4.52 | -0.49 |
| CombinedPlacementTestBattery | 127 | 0 | 1 | -2.87 | 1.59 | 4.46 | -0.73 |
| SummerLEVEL | 127 | 0 | 1 | -3.16 | 1.80 | 4.96 | -0.60 |
This table now shows the descriptive statistics of each standardized score, with mean \(0\) and standard deviation \(1\). You could see that the range of the scores are pretty similar and none of the scores are highly skewed since all the skewness values fall between -1 and 1.
Next I checked with the distributions of the human rated speaking score and the Emmersion speaking score. You could see that the Emmersion score curve is more peaked around zero meaning that more exminees are around the mean. The red area here shows that there are more examinees around higher range and lower range of the human rated speaking score. This may indicate that human raters are more likely to give more extreme scores.
Similar distribution shapes were found for writing scores.
| type | n.students | Mean | SD | Min | Max | Range | Skewness |
|---|---|---|---|---|---|---|---|
| CombinedPlacementTestBattery | 127 | 0 | 1.00 | -2.87 | 1.59 | 4.46 | -0.73 |
| EM.total.score | 127 | 0 | 0.83 | -2.29 | 2.10 | 4.39 | -0.33 |
In order to understand the relationship between the Emmersion speaking and writing scores and the total score, I wanted to compute Emmersion total score. First I examined the combined score provided in the dataset and found that this combined score is almost perfectly correlated with the average score of the human rated speaking and writing scores and Emmersion reading and listening scores. So next I averaged Emmersion speaking, writing, reading, and listening scores and created a Emmersion total score. Again, both combined score and Emmersion total score are standardized.
Here you could see that the Emmersion total score is more normally distributed and less skewed meaning that there are more examinees distributed around the mean and less extreme scores were reported. Combined score distribution is more skewed and has less examinees around the mean. This indicates that human raters seem to give more extreme scores, that are either too high or too low, compared to the Emmersion’s automated adaptive tests.
Next I ran Spearman’s rank order correlation test to evaluate the rank order of the two scores. In this test, the null hypothesis is that there is no monotonic association between the two scores in the population.
All tests I conducted below showed rejecting the null hypothesis at \(\alpha\) level \(0.05\) indicating that there are statistically significant monotonic relationship between the scores I tested. And here, setting the \(\alpha\) level at \(0.05\) means there is less than a \(5\%\) chance that the strength of the relationship I found happened by chance if the null hypothesis were true.
Spearman's rank correlation rho
data: dat1.t1.std$HRSPEAK and dat1.t1.std$EMSPEAKING
S = 97509, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.7143656
The rank order relationship between human rated and Emmersion speaking scores was about \(0.7\), which is fairly high correlation showing that Emmersion speaking score represents well with what human rated speaking score are representing.
Spearman's rank correlation rho
data: dat1.t1.std$HRSPEAK and dat1.t1.std$CombinedPlacementTestBattery
S = 81121, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.762371
The correlation between the human rated speaking score and combined score was fairly high as well.
Spearman's rank correlation rho
data: dat1.t1.std$EMSPEAKING and dat1.t1.std$EM.total.score
S = 62574, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.8167018
The relationship between the Emmersion speaking score and Emmersion total score was even stronger, which was about \(0.82\). This means that the higher an examinee ranked in Emmersion Speaking, the higher the examinee ranked in Emmersion total score, and vice versa.
Spearman's rank correlation rho
data: dat1.t1.std$HRWRITE and dat1.t1.std$EMWRITING
S = 131783, p-value = 1.646e-14
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.6139657
The rank order relationship between human rated writing score and Emmersion writing score was about \(0.6\), which is moderately high.
Spearman's rank correlation rho
data: dat1.t1.std$HRWRITE and dat1.t1.std$CombinedPlacementTestBattery
S = 107384, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.6854369
The correlation between the human rated writing score and combined score was moderately high as well.
Spearman's rank correlation rho
data: dat1.t1.std$EMWRITING and dat1.t1.std$EM.total.score
S = 57923, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.8303254
The relationship between the Emmersion writing score and Emmersion total score was very strong, which was about \(0.83\). Overall, the results indicated that the rank on each Emmersion speaking and writing score is more highly consistent with the rank on Emmersion total score.
Next I evaluated how examinees are distributed with each score and how the distribution of the examinees changes when different scores are used. I visualized this into plots using paired data.
In this plot, two different scores are on the x-axis, which are human rated speaking score on the left and Emmersion speaking score on the right. And you could see standardized scores are on the y-axis.
The human rated scores are more unevenly distributed as you could see there are many thick points where examinees are clustered. This indicates that it is difficult to discriminate examinees in terms of their English speaking ability when human rated scores are used. On the contrary, you could see that the Emmersion speaking scores are evenly distributed across the score range indicating that the Emmersion score is much more useful in discriminating examinees.
Very similar patterns were observed with the writing scores.
Lastly I also examined the difference between the combined scores and Emmersion total score. The Emmersion total score showed less examinees on extreme scores and more examinees around the average compared to the combined score. This indicates that the Emmersion total score follows more close with the normal distribution and better functioning in assessing English abilities of examinees.
In summary, the following evidence could be used to support the use of Emmersion Speaking and Writing assessments in place of human rated assessments:
Emmersion speaking and writing scores and total score are more normally distributed representing better with the overall English ability of examinees.
The relationship between the Emmersion score and human rated score were fairly strong indicating the Emmersion scores are doing the same job in terms of discriminating examinees for their overall English ability.
Much stronger relationship was found between the Emmersion speaking and writing score and Emmersion total score compared to the relationship between human rated speaking and writing score and combined score. This result implies that the rank order of the examinees aligns very well with the total score when Emmersion speaking and writing assessments were used.
Emmersion speaking and writing assessments are better in discriminating examinees compared to the human rated assessments. This finding is especially important regarding the fact that these scores are used to inform examinees on which level of the class they will be assigned.
| type | n.students | Mean | SD | Min | Max | Range | Skewness |
|---|---|---|---|---|---|---|---|
| FLUENCY1 | 127 | 0 | 1 | -3.36 | 1.75 | 5.11 | -0.97 |
| FLUENCY2 | 127 | 0 | 1 | -2.60 | 2.70 | 5.30 | -0.09 |
| PRONUN1 | 127 | 0 | 1 | -2.43 | 2.32 | 4.75 | -0.30 |
| PRONUN2 | 127 | 0 | 1 | -1.47 | 3.62 | 5.09 | 1.53 |
| VOCAB | 127 | 0 | 1 | -0.94 | 5.03 | 5.97 | 2.92 |
| SENTENCEMASTERY | 127 | 0 | 1 | -2.90 | 1.43 | 4.33 | -0.81 |
| EMSPEAKING | 127 | 0 | 1 | -2.00 | 2.61 | 4.61 | 0.23 |
| HRSPEAK | 127 | 0 | 1 | -2.45 | 1.92 | 4.37 | -0.25 |
First I checked the descriptive statistics of each score. Again, all scores are standardized with mean \(0\) and standard deviation \(1\). Some variables such as PRONOUN2 and VOCAB are highly skewed and it is shown in this boxplot as well.
FLUENCY1 FLUENCY2 PRONUN1 PRONUN2 VOCAB SENTENCEMASTERY EMSPEAKING
FLUENCY1
FLUENCY2 0.89***
PRONUN1 0.22* 0.28**
PRONUN2 0.39*** 0.54*** 0.55***
VOCAB 0.30*** 0.42*** 0.38*** 0.90***
SENTENCEMASTERY 0.47*** 0.62*** 0.44*** 0.73*** 0.49***
EMSPEAKING 0.50*** 0.64*** 0.43*** 0.90*** 0.77*** 0.87***
HRSPEAK 0.52*** 0.61*** 0.31*** 0.63*** 0.49*** 0.71*** 0.72***
The correlation table shows that PRONOUN2, VOCAB, and SENTENCEMASTERY scores are fairly highly correlated with Emmersion Speaking score and only SENTENCEMASTERY score was fairly highly correlated with the human rated speaking score.
In order to investigate which score variable contributes the most to examinees’ English speaking ability, I ran linear regression models using 6 additional speaking scores as dependent variables and human rated speaking score and Emmersion speaking score independent variables.
Call:
lm(formula = HRSPEAK ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 +
VOCAB + SENTENCEMASTERY, data = dat1.t2.std)
Residuals:
Min 1Q Median 3Q Max
-1.62113 -0.52201 -0.05416 0.46225 1.71278
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.878e-16 5.925e-02 0.000 1.000000
FLUENCY1 1.677e-01 1.352e-01 1.240 0.217233
FLUENCY2 7.477e-02 1.531e-01 0.488 0.626218
PRONUN1 -8.284e-02 7.538e-02 -1.099 0.273996
PRONUN2 2.601e-01 2.363e-01 1.101 0.273281
VOCAB -4.519e-03 1.757e-01 -0.026 0.979526
SENTENCEMASTERY 4.314e-01 1.145e-01 3.768 0.000257 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6678 on 120 degrees of freedom
Multiple R-squared: 0.5753, Adjusted R-squared: 0.5541
F-statistic: 27.09 on 6 and 120 DF, p-value: < 2.2e-16
SENTENCEMASTERY score was the only statistically significant variable in in predicting human rated speaking score. The SENTENCEMASTERY score was positively associated with the human rated speaking score, after controlling all other variables. About \(57.5\%\) of the variation in human rated speaking score was explained by the model.
Call:
lm(formula = EMSPEAKING ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 +
VOCAB + SENTENCEMASTERY, data = dat1.t2.std)
Residuals:
Min 1Q Median 3Q Max
-0.88738 -0.17731 -0.02349 0.13919 0.84548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.308e-16 2.447e-02 0.000 1.000000
FLUENCY1 7.050e-02 5.582e-02 1.263 0.208989
FLUENCY2 5.545e-04 6.322e-02 0.009 0.993017
PRONUN1 -7.969e-02 3.112e-02 -2.561 0.011690 *
PRONUN2 3.359e-01 9.756e-02 3.443 0.000792 ***
VOCAB 2.196e-01 7.255e-02 3.027 0.003026 **
SENTENCEMASTERY 5.215e-01 4.727e-02 11.032 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2757 on 120 degrees of freedom
Multiple R-squared: 0.9276, Adjusted R-squared: 0.924
F-statistic: 256.2 on 6 and 120 DF, p-value: < 2.2e-16
PRONUN1, PRONUN2, VOCAB, and SENTENCEMASTERY scores were statistically significant in predicting Emmersion speaking score. PRONUN1 score was negatively associated with the Emmersion speaking score, after controlling all other variables. About \(92.8\%\) of the variation in Emmersion speaking score was explained by the model.
I further ran stepwise regression using backward selection method to find the subset of score variables resulting in the best performing model. Backward selection means that I start with all predictors in the model, which I call the full model, and iteratively removes the least contributing predictors, and stops when I have a model where all predictors are statistically significant.
Single term deletions
Model:
EMSPEAKING ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + VOCAB +
SENTENCEMASTERY
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 9.1222 -320.45
FLUENCY1 1 0.1213 9.2435 -320.77 1.5955 0.2089893
FLUENCY2 1 0.0000 9.1222 -322.45 0.0001 0.9930173
PRONUN1 1 0.4984 9.6206 -315.70 6.5566 0.0116897 *
PRONUN2 1 0.9014 10.0236 -310.48 11.8572 0.0007917 ***
VOCAB 1 0.6964 9.8186 -313.11 9.1614 0.0030255 **
SENTENCEMASTERY 1 9.2519 18.3741 -233.52 121.7058 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
On the first run, FLUENCY2 was the least significant predictor in the model so it was deleted in the next model.
Single term deletions
Model:
EMSPEAKING ~ FLUENCY1 + PRONUN1 + PRONUN2 + VOCAB + SENTENCEMASTERY
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 9.1222 -322.45
FLUENCY1 1 0.4871 9.6093 -317.84 6.4615 0.0122868 *
PRONUN1 1 0.5015 9.6237 -317.66 6.6517 0.0111035 *
PRONUN2 1 0.9074 10.0296 -312.41 12.0365 0.0007235 ***
VOCAB 1 0.6964 9.8187 -315.11 9.2379 0.0029058 **
SENTENCEMASTERY 1 9.8359 18.9581 -231.55 130.4668 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After eliminating FLUENCY2 variable in the model, now all predictors are significantly contributing in predicting Emmersion speaking score.
Call:
lm(formula = EMSPEAKING ~ FLUENCY1 + PRONUN1 + PRONUN2 + VOCAB +
SENTENCEMASTERY, data = dat1.t2.std)
Residuals:
Min 1Q Median 3Q Max
-0.88734 -0.17742 -0.02338 0.13924 0.84536
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.308e-16 2.436e-02 0.000 1.000000
FLUENCY1 7.093e-02 2.790e-02 2.542 0.012287 *
PRONUN1 -7.971e-02 3.091e-02 -2.579 0.011103 *
PRONUN2 3.360e-01 9.685e-02 3.469 0.000724 ***
VOCAB 2.196e-01 7.225e-02 3.039 0.002906 **
SENTENCEMASTERY 5.216e-01 4.567e-02 11.422 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2746 on 121 degrees of freedom
Multiple R-squared: 0.9276, Adjusted R-squared: 0.9246
F-statistic: 310.1 on 5 and 121 DF, p-value: < 2.2e-16
The \(R^2\) of the final model is \(92.8\%\) and the value is same as the \(R^2\) of the full model. This indicates that the FLUENCY2 score is not significantly contributing in predicting Emmersion speaking score. The finding is consistent that SENTENCEMASTERY score is the most contributing variable in predicting Emmersion speaking score.
Single term deletions
Model:
HRSPEAK ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + VOCAB + SENTENCEMASTERY
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 53.510 -95.769
FLUENCY1 1 0.6861 54.196 -96.151 1.5387 0.2172332
FLUENCY2 1 0.1063 53.616 -97.517 0.2385 0.6262182
PRONUN1 1 0.5385 54.048 -96.497 1.2077 0.2739960
PRONUN2 1 0.5401 54.050 -96.493 1.2113 0.2732812
VOCAB 1 0.0003 53.510 -97.768 0.0007 0.9795260
SENTENCEMASTERY 1 6.3300 59.840 -83.569 14.1956 0.0002568 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I ran the same stepwise regression, now with the human rated speaking score as an outcome variable. In the full model, only the SENTENCEMASTERY score is statistically significant and VOCAB is the least contributing predictor. So in the next model, I removed VOCAB score.
Single term deletions
Model:
HRSPEAK ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + SENTENCEMASTERY
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 53.510 -97.768
FLUENCY1 1 0.6858 54.196 -98.151 1.5508 0.215417
FLUENCY2 1 0.1063 53.616 -99.516 0.2404 0.624820
PRONUN1 1 0.5897 54.100 -98.376 1.3335 0.250467
PRONUN2 1 3.1340 56.644 -92.540 7.0867 0.008821 **
SENTENCEMASTERY 1 9.1386 62.649 -79.744 20.6647 1.307e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this model the least contributing predictor is FLUENCY2 score so I removed this variable in the next model.
Single term deletions
Model:
HRSPEAK ~ FLUENCY1 + PRONUN1 + PRONUN2 + SENTENCEMASTERY
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 53.616 -99.516
FLUENCY1 1 4.9013 58.518 -90.407 11.1526 0.001114 **
PRONUN1 1 0.6341 54.251 -100.023 1.4429 0.231991
PRONUN2 1 3.5104 57.127 -93.462 7.9876 0.005505 **
SENTENCEMASTERY 1 10.6053 64.222 -78.594 24.1315 2.827e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now PRONUN1 is the least contributing predictor.
Single term deletions
Model:
HRSPEAK ~ FLUENCY1 + PRONUN2 + SENTENCEMASTERY
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 54.251 -100.023
FLUENCY1 1 4.9678 59.218 -90.895 11.2632 0.001052 **
PRONUN2 1 2.8900 57.141 -95.432 6.5523 0.011686 *
SENTENCEMASTERY 1 10.2794 64.530 -79.986 23.3060 4.016e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After removing PRONUN1 score from the model, now all predictors are significantly predicting human rated speaking score.
Call:
lm(formula = HRSPEAK ~ FLUENCY1 + PRONUN2 + SENTENCEMASTERY,
data = dat1.t2.std)
Residuals:
Min 1Q Median 3Q Max
-1.80287 -0.49203 -0.05194 0.43464 1.69636
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.468e-16 5.893e-02 0.000 1.00000
FLUENCY1 2.262e-01 6.739e-02 3.356 0.00105 **
PRONUN2 2.228e-01 8.702e-02 2.560 0.01169 *
SENTENCEMASTERY 4.384e-01 9.081e-02 4.828 4.02e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6641 on 123 degrees of freedom
Multiple R-squared: 0.5694, Adjusted R-squared: 0.5589
F-statistic: 54.22 on 3 and 123 DF, p-value: < 2.2e-16
The final model had \(R^2\) equal to \(0.60\). The initial full model had \(R^2\) of $ 0.58$, meaning that only \(2\%\) of the variance was reduces after eliminating three variables from the model. Again, the most contributing score is SENTENCEMASTERY.
The result showed that the SENTENCEMASTERY score contributed the most not only to the human rated speaking score but also Emmersion speaking score. This suggests that using SENTENCEMASTERY score would supplement any detail lost after not using human rated speaking assessment.
Third task was on first, analyzing data with item response theory (IRT) models, second, identifying the best fitting model, and last, recommending \(25\) to \(30\) items that would best perform on a new assessment.
The item response data consists of 60 items and 151 examinees.
n = 60
for (i in 1:n){
names(dat2)[i+1] = paste("Item", i, sep = "")
print(names(dat2))
}First, I renamed items with Item 1 through Item 60.
[1] 139
About \(1\%\) of the total cases was missing. On average, about \(1\%\) of the examinees was missing across items and about \(1\%\) of the responses was missing across examinees. This was very small number of missing data. So I conducted listwise deletion, and total 12 rows were removed.
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Item1 | 1 | 139 | 2.54 | 0.74 | 3 | 2.69 | 0.00 | 0 | 3 | 3 | -1.55 | 1.66 | 0.06 |
| Item2 | 2 | 139 | 1.57 | 1.04 | 2 | 1.58 | 1.48 | 0 | 3 | 3 | -0.28 | -1.12 | 0.09 |
| Item3 | 3 | 139 | 2.08 | 0.89 | 2 | 2.17 | 1.48 | 0 | 3 | 3 | -0.64 | -0.48 | 0.08 |
| Item4 | 4 | 139 | 2.03 | 1.04 | 2 | 2.15 | 1.48 | 0 | 3 | 3 | -0.60 | -0.96 | 0.09 |
| Item5 | 5 | 139 | 0.93 | 0.72 | 1 | 0.88 | 0.00 | 0 | 3 | 3 | 0.45 | 0.03 | 0.06 |
| Item6 | 6 | 139 | 2.27 | 0.84 | 2 | 2.38 | 1.48 | 0 | 3 | 3 | -0.91 | -0.05 | 0.07 |
| Item7 | 7 | 139 | 1.81 | 1.19 | 2 | 1.88 | 1.48 | 0 | 3 | 3 | -0.31 | -1.49 | 0.10 |
| Item8 | 8 | 139 | 2.55 | 0.70 | 3 | 2.68 | 0.00 | 0 | 3 | 3 | -1.74 | 3.06 | 0.06 |
| Item9 | 9 | 139 | 1.68 | 1.12 | 2 | 1.73 | 1.48 | 0 | 3 | 3 | -0.07 | -1.43 | 0.09 |
| Item10 | 10 | 139 | 2.10 | 1.03 | 2 | 2.24 | 1.48 | 0 | 3 | 3 | -0.71 | -0.84 | 0.09 |
| Item11 | 11 | 139 | 2.10 | 0.94 | 2 | 2.19 | 1.48 | 0 | 3 | 3 | -0.56 | -0.93 | 0.08 |
| Item12 | 12 | 139 | 2.45 | 0.77 | 3 | 2.58 | 0.00 | 0 | 3 | 3 | -1.32 | 1.14 | 0.07 |
| Item13 | 13 | 139 | 2.05 | 0.92 | 2 | 2.17 | 1.48 | 0 | 3 | 3 | -0.77 | -0.22 | 0.08 |
| Item14 | 14 | 139 | 1.69 | 0.96 | 2 | 1.73 | 1.48 | 0 | 3 | 3 | -0.18 | -0.96 | 0.08 |
| Item15 | 15 | 139 | 2.32 | 0.79 | 2 | 2.42 | 1.48 | 0 | 3 | 3 | -0.97 | 0.30 | 0.07 |
| Item16 | 16 | 139 | 1.72 | 1.16 | 2 | 1.77 | 1.48 | 0 | 3 | 3 | -0.25 | -1.43 | 0.10 |
| Item17 | 17 | 139 | 2.21 | 1.01 | 3 | 2.35 | 0.00 | 0 | 3 | 3 | -0.84 | -0.72 | 0.09 |
| Item18 | 18 | 139 | 1.94 | 0.94 | 2 | 1.97 | 1.48 | 0 | 3 | 3 | -0.18 | -1.30 | 0.08 |
| Item19 | 19 | 139 | 2.60 | 0.68 | 3 | 2.72 | 0.00 | 0 | 3 | 3 | -1.94 | 4.11 | 0.06 |
| Item20 | 20 | 139 | 2.60 | 0.63 | 3 | 2.69 | 0.00 | 0 | 3 | 3 | -1.81 | 4.01 | 0.05 |
| Item21 | 21 | 139 | 2.36 | 0.86 | 3 | 2.50 | 0.00 | 0 | 3 | 3 | -1.16 | 0.40 | 0.07 |
| Item22 | 22 | 139 | 2.54 | 0.69 | 3 | 2.65 | 0.00 | 0 | 3 | 3 | -1.69 | 3.08 | 0.06 |
| Item23 | 23 | 139 | 1.32 | 1.17 | 1 | 1.28 | 1.48 | 0 | 3 | 3 | 0.34 | -1.38 | 0.10 |
| Item24 | 24 | 139 | 2.34 | 0.82 | 3 | 2.47 | 0.00 | 0 | 3 | 3 | -1.15 | 0.72 | 0.07 |
| Item25 | 25 | 139 | 0.94 | 0.61 | 1 | 0.92 | 0.00 | 0 | 3 | 3 | 0.22 | 0.33 | 0.05 |
| Item26 | 26 | 139 | 1.77 | 1.00 | 2 | 1.83 | 1.48 | 0 | 3 | 3 | -0.30 | -1.02 | 0.09 |
| Item27 | 27 | 139 | 2.47 | 0.75 | 3 | 2.61 | 0.00 | 0 | 3 | 3 | -1.50 | 2.02 | 0.06 |
| Item28 | 28 | 139 | 1.57 | 0.96 | 1 | 1.58 | 1.48 | 0 | 3 | 3 | 0.17 | -1.04 | 0.08 |
| Item29 | 29 | 139 | 1.42 | 0.95 | 1 | 1.41 | 1.48 | 0 | 3 | 3 | 0.29 | -0.86 | 0.08 |
| Item30 | 30 | 139 | 2.56 | 0.65 | 3 | 2.65 | 0.00 | 0 | 3 | 3 | -1.64 | 3.25 | 0.06 |
| Item31 | 31 | 139 | 1.52 | 1.04 | 1 | 1.52 | 1.48 | 0 | 3 | 3 | 0.09 | -1.19 | 0.09 |
| Item32 | 32 | 139 | 2.41 | 0.87 | 3 | 2.58 | 0.00 | 0 | 3 | 3 | -1.42 | 1.14 | 0.07 |
| Item33 | 33 | 139 | 1.40 | 0.98 | 1 | 1.38 | 1.48 | 0 | 3 | 3 | 0.20 | -0.97 | 0.08 |
| Item34 | 34 | 139 | 2.42 | 0.89 | 3 | 2.59 | 0.00 | 0 | 3 | 3 | -1.41 | 0.91 | 0.08 |
| Item35 | 35 | 139 | 0.76 | 0.79 | 1 | 0.68 | 1.48 | 0 | 3 | 3 | 0.63 | -0.60 | 0.07 |
| Item36 | 36 | 139 | 2.46 | 0.75 | 3 | 2.59 | 0.00 | 0 | 3 | 3 | -1.58 | 2.49 | 0.06 |
| Item37 | 37 | 139 | 1.36 | 1.08 | 1 | 1.33 | 1.48 | 0 | 3 | 3 | 0.22 | -1.23 | 0.09 |
| Item38 | 38 | 139 | 0.94 | 0.94 | 1 | 0.82 | 1.48 | 0 | 3 | 3 | 0.74 | -0.37 | 0.08 |
| Item39 | 39 | 139 | 1.54 | 1.09 | 1 | 1.55 | 1.48 | 0 | 3 | 3 | 0.15 | -1.34 | 0.09 |
| Item40 | 40 | 139 | 1.14 | 1.07 | 1 | 1.06 | 1.48 | 0 | 3 | 3 | 0.55 | -0.97 | 0.09 |
| Item41 | 41 | 139 | 1.35 | 0.99 | 1 | 1.31 | 1.48 | 0 | 3 | 3 | 0.38 | -0.90 | 0.08 |
| Item42 | 42 | 139 | 1.35 | 0.92 | 1 | 1.32 | 0.00 | 0 | 3 | 3 | 0.46 | -0.65 | 0.08 |
| Item43 | 43 | 139 | 1.51 | 1.12 | 1 | 1.51 | 1.48 | 0 | 3 | 3 | 0.08 | -1.38 | 0.09 |
| Item44 | 44 | 139 | 1.35 | 1.23 | 1 | 1.31 | 1.48 | 0 | 3 | 3 | 0.25 | -1.55 | 0.10 |
| Item45 | 45 | 139 | 1.05 | 1.07 | 1 | 0.95 | 1.48 | 0 | 3 | 3 | 0.69 | -0.79 | 0.09 |
| Item46 | 46 | 139 | 2.78 | 0.60 | 3 | 2.93 | 0.00 | 0 | 3 | 3 | -3.30 | 11.34 | 0.05 |
| Item47 | 47 | 139 | 1.12 | 1.00 | 1 | 1.04 | 1.48 | 0 | 3 | 3 | 0.54 | -0.76 | 0.08 |
| Item48 | 48 | 139 | 0.97 | 0.95 | 1 | 0.87 | 1.48 | 0 | 3 | 3 | 0.61 | -0.65 | 0.08 |
| Item49 | 49 | 139 | 0.88 | 0.93 | 1 | 0.78 | 1.48 | 0 | 3 | 3 | 0.67 | -0.65 | 0.08 |
| Item50 | 50 | 139 | 1.12 | 0.83 | 1 | 1.10 | 1.48 | 0 | 3 | 3 | 0.17 | -0.80 | 0.07 |
| Item51 | 51 | 139 | 1.33 | 0.97 | 1 | 1.29 | 1.48 | 0 | 3 | 3 | 0.34 | -0.87 | 0.08 |
| Item52 | 52 | 139 | 0.92 | 0.70 | 1 | 0.87 | 0.00 | 0 | 3 | 3 | 0.61 | 0.65 | 0.06 |
| Item53 | 53 | 139 | 1.14 | 0.80 | 1 | 1.12 | 1.48 | 0 | 3 | 3 | 0.26 | -0.49 | 0.07 |
| Item54 | 54 | 139 | 0.99 | 0.88 | 1 | 0.91 | 1.48 | 0 | 3 | 3 | 0.54 | -0.50 | 0.07 |
| Item55 | 55 | 139 | 0.85 | 0.77 | 1 | 0.77 | 1.48 | 0 | 3 | 3 | 0.73 | 0.31 | 0.07 |
| Item56 | 56 | 139 | 1.19 | 1.07 | 1 | 1.12 | 1.48 | 0 | 3 | 3 | 0.44 | -1.06 | 0.09 |
| Item57 | 57 | 139 | 1.93 | 0.93 | 2 | 2.01 | 1.48 | 0 | 3 | 3 | -0.45 | -0.75 | 0.08 |
| Item58 | 58 | 139 | 1.11 | 0.81 | 1 | 1.04 | 0.00 | 0 | 3 | 3 | 0.60 | 0.07 | 0.07 |
| Item59 | 59 | 139 | 1.76 | 0.99 | 2 | 1.81 | 1.48 | 0 | 3 | 3 | -0.03 | -1.25 | 0.08 |
| Item60 | 60 | 139 | 1.16 | 1.01 | 1 | 1.08 | 1.48 | 0 | 3 | 3 | 0.57 | -0.75 | 0.09 |
[1] 1.705156
[1] 0.1548018
The minimum and maximum values of the item response are \(0\) and \(3\), respectively. This indicates there are four response categories for these items. The average response score was \(1.7\) and standard deviation was around \(0.2\)
When the multiple categories types of item response data is used, polytomous item response theory models need be applied. In order to identify the best fitting model, I applied five item response theory models that include include partial credit model, generalized partial credit model, rating scale model, graded response model, and nominal response model.
First I applied partial credit model. This model was developed to analyze test items that require multiple steps which allows to assign partial credit for completing several steps. This model is from a a family of the Rasch model meaning that it’s not estimating item discrimination parameters but only item category parameters.
Call:
mirt(data = dat2.item, model = 1, itemtype = "Rasch")
Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 129 EM iterations.
mirt version: 1.33.2
M-step optimizer: nlminb
EM acceleration: Ramsay
Number of rectangular quadrature: 61
Latent density type: Gaussian
Log-likelihood = -6077.361
Estimated parameters: 181
AIC = 12516.72; AICc = 10984.54
BIC = 13047.86; SABIC = 12475.22
G2 (1e+10) = 10785.71, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR TLI
stats 3038.552 1650 0 0.07809073 0.07346065 0.08213571 0.08425738 0.967646
CFI
stats 0.967646
Generalized partial credit model is an extended partial credit model that further estimates item discrimination parameter in addition to the item category parameters.
Call:
mirt(data = dat2.item, model = 1, itemtype = "gpcm")
Full-information item factor analysis with 1 factor(s).
FAILED TO CONVERGE within 1e-04 tolerance after 500 EM iterations.
mirt version: 1.33.2
M-step optimizer: BFGS
EM acceleration: Ramsay
Number of rectangular quadrature: 61
Latent density type: Gaussian
Log-likelihood = -5955.801
Estimated parameters: 240
AIC = 12391.6; AICc = 11257.49
BIC = 13095.88; SABIC = 12336.57
G2 (1e+10) = 10542.59, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR TLI
stats 3259.788 1590 0 0.08723535 0.08266685 0.09116961 0.05963919 0.9596249
CFI
stats 0.9610931
Rating scale model is a constrained version of the partial credit model. This model was developed to measure rating scales such as Likert scales, which were assumed to function in the same way across all items in a test.
Call:
mirt(data = dat2.item, model = 1, itemtype = "rsm")
Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 116 EM iterations.
mirt version: 1.33.2
M-step optimizer: nlminb
EM acceleration: Ramsay
Number of rectangular quadrature: 61
Latent density type: Gaussian
Log-likelihood = -6503.316
Estimated parameters: 240
AIC = 13132.63; AICc = 13240.15
BIC = 13317.5; SABIC = 13118.19
G2 (1e+10) = 11637.62, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
M2 df p RMSEA RMSEA_5 RMSEA_95 TLI CFI
stats 3884.297 1768 0 0.09313387 0.08884813 0.09674998 0.9539803 0.9506892
Graded response model is appropriate when item responses are ordered categorical responses. This model is a generalization of the 2PL model so it estimates item discrimination parameters and category parameters.
Call:
mirt(data = dat2.item, model = 1, itemtype = "graded", SE = TRUE)
Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 328 EM iterations.
mirt version: 1.33.2
M-step optimizer: BFGS
EM acceleration: Ramsay
Number of rectangular quadrature: 61
Latent density type: Gaussian
Information matrix estimated with method: Oakes
Second-order test: model is a possible local maximum
Condition number of information matrix = 361.1029
Log-likelihood = -5862.487
Estimated parameters: 240
AIC = 12204.97; AICc = 11070.86
BIC = 12909.25; SABIC = 12149.94
G2 (1e+10) = 10355.96, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR TLI
stats 3492.428 1590 0 0.09311421 0.08861305 0.09694605 0.05807705 0.9539929
CFI
stats 0.9556659
Nominal response model can be used to item responses that are not strictly ordered but also can be used to ordered responses. This model is considered the most general polytomous item response theory model.
Call:
mirt(data = dat2.item, model = 1, itemtype = "nominal")
Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 100 EM iterations.
mirt version: 1.33.2
M-step optimizer: BFGS
EM acceleration: Ramsay
Number of rectangular quadrature: 61
Latent density type: Gaussian
Log-likelihood = -5749.68
Estimated parameters: 360
AIC = 12219.36; AICc = 11048.55
BIC = 13275.77; SABIC = 12136.81
G2 (1e+10) = 10130.35, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
M2 df p RMSEA RMSEA_5 RMSEA_95 TLI CFI
stats 3291.605 1470 0 0.09476084 0.09011006 0.09873236 0.9523563 0.9575538
Model 1: mirt(data = dat2.item, model = 1, itemtype = "rsm")
Model 2: mirt(data = dat2.item, model = 1, itemtype = "Rasch")
AIC AICc SABIC HQ BIC logLik X2 df p
1 13132.63 13240.15 13118.19 13207.76 13317.50 -6503.316 NaN NaN NaN
2 12516.72 10984.54 12475.22 12732.56 13047.86 -6077.361 851.908 118 0
Model 1: mirt(data = dat2.item, model = 1, itemtype = "Rasch")
Model 2: mirt(data = dat2.item, model = 1, itemtype = "gpcm")
AIC AICc SABIC HQ BIC logLik X2 df p
1 12516.72 10984.54 12475.22 12732.56 13047.86 -6077.361 NaN NaN NaN
2 12391.60 11257.49 12336.57 12677.80 13095.88 -5955.801 243.12 59 0
For nested models, goodness-of-fit statistics such as difference \(\chi^2\) can be used to select better fitting models.
| Model | LogLikelihood | AIC | BIC | RMSEA | CFI | TLI |
|---|---|---|---|---|---|---|
| PCM | -6077.36 | 12516.7 | 13047.9 | 0.078 | 0.97 | 0.97 |
| GPCM | -5955.80 | 12391.6 | 13095.9 | 0.087 | 0.96 | 0.96 |
| RSM | -6503.32 | 13132.6 | 13317.5 | 0.093 | 0.95 | 0.95 |
| GRM | -5862.49 | 12205.0 | 12909.3 | 0.093 | 0.96 | 0.95 |
| NRM | -5749.68 | 12219.4 | 13275.8 | 0.095 | 0.96 | 0.95 |
For non-nested model comparison, the log-likelihood and information criteria such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) can be used. Lower AIC and BIC values, and higher log-likelihood values, indicate a better comparative fit.
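These criteria are simple functions of the log-likelihood and the number of estimated parameters; for instance, the NRM output above (360 parameters, log-likelihood -5749.68) reproduces the reported AIC. A minimal sketch:

```python
import math

def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2*logLik."""
    return 2 * k - 2 * loglik

def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k*ln(n) - 2*logLik."""
    return k * math.log(n) - 2 * loglik

# NRM: 360 estimated parameters, logLik = -5749.68 (from the output above)
print(round(aic(-5749.68, 360), 2))  # 12219.36, matching the reported AIC
```

BIC additionally requires the sample size n, which is not shown in this excerpt; for any realistic n it penalizes the heavily parameterized NRM more strongly than AIC does.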
Other indices such as the root mean square error of approximation (RMSEA), the Tucker-Lewis index (TLI), and the comparative fit index (CFI) are also helpful for evaluating model fit and judging which model fits best.
CFI and TLI values above 0.90 indicate acceptable fit, with values above 0.95 suggesting excellent fit; RMSEA values below 0.05 indicate close fit, and values between 0.05 and 0.08 reflect reasonable fit.
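A small helper makes these cutoffs concrete; this is an illustrative sketch (the function name and labels are my own), applied here to the PCM row of the comparison table:

```python
def fit_verdict(rmsea: float, cfi: float, tli: float) -> dict:
    """Apply the conventional cutoffs described above."""
    return {
        "cfi_tli": "excellent" if min(cfi, tli) > 0.95
                   else "acceptable" if min(cfi, tli) > 0.90 else "poor",
        "rmsea": "close" if rmsea < 0.05
                 else "reasonable" if rmsea <= 0.08 else "marginal",
    }

# PCM row from the model-comparison table: RMSEA = 0.078, CFI = TLI = 0.97
print(fit_verdict(0.078, 0.97, 0.97))  # {'cfi_tli': 'excellent', 'rmsea': 'reasonable'}
```

Note that by these cutoffs several of the models show RMSEA above 0.08, so the conclusion rests mainly on the comparative indices (log-likelihood, AIC, CFI/TLI).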
After evaluating these fit indices for each model, I concluded that the graded response model fits the given response data best.
To recommend about 25 to 30 items, I fitted the graded response model and obtained item parameter estimates, standardized factor loadings, category response curves, and item information curves.
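The category response curves follow from the graded response model's cumulative-logit form; in the standard parameterization,

\[
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp\left[-a_i(\theta - b_{ik})\right]}, \qquad
P(X_i = k \mid \theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),
\]

with \(P^{*}_{i0}(\theta) = 1\) and \(P^{*}_{i,K_i}(\theta) = 0\), where \(a_i\) is the item discrimination and the \(b_{ik}\) are the category thresholds.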
Item parameter estimates and standardized factor loadings for all 60 items were extracted and saved in a separate CSV file.
| item | Zh | outfit | z.outfit | infit | z.infit | X2 | df.X2 | RMSEA.X2 | p.X2 | G2 | df.G2 | RMSEA.G2 | p.G2 | S_X2 | df.S_X2 | RMSEA.S_X2 | p.S_X2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Item1 | 0.21 | 0.59 | -0.44 | 0.97 | -0.09 | 2.45 | 2 | 0.04 | 0.29 | 1.72 | 2 | 0.00 | 0.42 | 13.08 | 14 | 0.00 | 0.52 |
| Item2 | 0.31 | 0.89 | -0.48 | 0.97 | -0.15 | 8.04 | 5 | 0.07 | 0.15 | 6.35 | 4 | 0.07 | 0.17 | 19.81 | 19 | 0.02 | 0.41 |
| Item3 | 0.13 | 2.94 | 2.89 | 0.95 | -0.28 | 1.79 | 2 | 0.00 | 0.41 | 0.82 | 2 | 0.00 | 0.66 | 12.31 | 15 | 0.00 | 0.66 |
| Item4 | -0.16 | 1.78 | 1.18 | 1.02 | 0.15 | 16.36 | 5 | 0.13 | 0.01 | 19.73 | 5 | 0.15 | 0.00 | 23.93 | 19 | 0.04 | 0.20 |
| Item5 | 0.03 | 1.03 | 0.24 | 1.04 | 0.36 | 5.02 | 5 | 0.01 | 0.41 | 3.76 | 5 | 0.00 | 0.58 | 29.46 | 25 | 0.04 | 0.25 |
| Item6 | 0.02 | 1.01 | 0.21 | 0.97 | -0.17 | 4.76 | 5 | 0.00 | 0.45 | 8.60 | 5 | 0.07 | 0.13 | 21.99 | 18 | 0.04 | 0.23 |
| Item7 | 0.16 | 0.76 | -0.19 | 0.99 | 0.01 | 8.71 | 3 | 0.12 | 0.03 | 12.30 | 2 | 0.19 | 0.00 | 14.03 | 15 | 0.00 | 0.52 |
| Item8 | 0.13 | 0.77 | -0.43 | 0.92 | -0.33 | 7.33 | 4 | 0.08 | 0.12 | 6.40 | 4 | 0.07 | 0.17 | 22.30 | 19 | 0.04 | 0.27 |
| Item9 | 0.16 | 1.36 | 0.81 | 0.93 | -0.45 | 5.14 | 3 | 0.07 | 0.16 | 7.53 | 3 | 0.10 | 0.06 | 24.47 | 19 | 0.05 | 0.18 |
| Item10 | 0.32 | 0.65 | -0.50 | 0.98 | -0.09 | 2.00 | 2 | 0.00 | 0.37 | 2.04 | 2 | 0.01 | 0.36 | 14.14 | 17 | 0.00 | 0.66 |
| Item11 | 0.35 | 0.65 | -0.67 | 0.96 | -0.23 | 3.08 | 3 | 0.01 | 0.38 | 2.78 | 3 | 0.00 | 0.43 | 15.71 | 16 | 0.00 | 0.47 |
| Item12 | 0.06 | 1.03 | 0.18 | 0.92 | -0.47 | 12.15 | 5 | 0.10 | 0.03 | 17.52 | 5 | 0.13 | 0.00 | 28.91 | 23 | 0.04 | 0.18 |
| Item13 | -0.11 | 1.01 | 0.16 | 1.04 | 0.33 | 13.50 | 4 | 0.13 | 0.01 | 12.13 | 4 | 0.12 | 0.02 | 20.47 | 18 | 0.03 | 0.31 |
| Item14 | -0.01 | 1.00 | 0.02 | 0.95 | -0.37 | 27.12 | 9 | 0.12 | 0.00 | 28.27 | 9 | 0.12 | 0.00 | 50.19 | 25 | 0.09 | 0.00 |
| Item15 | 0.06 | 0.88 | -0.03 | 1.02 | 0.17 | 2.40 | 4 | 0.00 | 0.66 | 5.30 | 4 | 0.05 | 0.26 | 10.07 | 16 | 0.00 | 0.86 |
| Item16 | 0.23 | 0.88 | -0.02 | 0.92 | -0.48 | 2.34 | 4 | 0.00 | 0.67 | 4.73 | 4 | 0.04 | 0.32 | 18.36 | 18 | 0.01 | 0.43 |
| Item17 | 0.41 | 0.61 | -0.78 | 0.93 | -0.36 | 5.73 | 2 | 0.12 | 0.06 | 5.90 | 2 | 0.12 | 0.05 | 17.06 | 13 | 0.05 | 0.20 |
| Item18 | 0.18 | 0.78 | -0.31 | 0.97 | -0.16 | 1.54 | 4 | 0.00 | 0.82 | 3.26 | 4 | 0.00 | 0.52 | 14.33 | 16 | 0.00 | 0.57 |
| Item19 | 0.06 | 0.75 | -0.15 | 0.99 | 0.01 | 4.15 | 3 | 0.05 | 0.25 | 3.85 | 3 | 0.05 | 0.28 | 15.12 | 16 | 0.00 | 0.52 |
| Item20 | 0.24 | 0.56 | -0.56 | 0.98 | -0.04 | 7.77 | 1 | 0.22 | 0.01 | 9.04 | 1 | 0.24 | 0.00 | 25.92 | 15 | 0.07 | 0.04 |
| Item21 | -0.07 | 16.72 | 10.03 | 0.97 | -0.13 | 4.17 | 6 | 0.00 | 0.65 | 6.25 | 6 | 0.02 | 0.40 | 12.30 | 19 | 0.00 | 0.87 |
| Item22 | -0.07 | 1.47 | 0.97 | 1.10 | 0.54 | 6.04 | 3 | 0.09 | 0.11 | 8.46 | 3 | 0.11 | 0.04 | 15.44 | 18 | 0.00 | 0.63 |
| Item23 | 0.19 | 0.83 | -0.37 | 0.99 | 0.01 | 8.82 | 5 | 0.07 | 0.12 | 8.38 | 5 | 0.07 | 0.14 | 23.79 | 21 | 0.03 | 0.30 |
| Item24 | -0.06 | 1.60 | 1.06 | 0.92 | -0.47 | 4.14 | 3 | 0.05 | 0.25 | 7.68 | 3 | 0.11 | 0.05 | 19.74 | 17 | 0.03 | 0.29 |
| Item25 | 0.15 | 0.93 | -0.39 | 0.97 | -0.19 | 9.27 | 4 | 0.10 | 0.05 | 10.86 | 4 | 0.11 | 0.03 | 16.53 | 19 | 0.00 | 0.62 |
| Item26 | -0.09 | 1.08 | 0.50 | 0.96 | -0.30 | 14.75 | 7 | 0.09 | 0.04 | 21.04 | 7 | 0.12 | 0.00 | 38.48 | 26 | 0.06 | 0.05 |
| Item27 | 0.11 | 0.75 | -0.37 | 1.00 | 0.08 | 6.61 | 4 | 0.07 | 0.16 | 9.59 | 4 | 0.10 | 0.05 | 31.37 | 18 | 0.07 | 0.03 |
| Item28 | 0.05 | 1.02 | 0.16 | 1.07 | 0.51 | 4.75 | 3 | 0.06 | 0.19 | -1.44 | 3 | 0.00 | 1.00 | 18.04 | 21 | 0.00 | 0.65 |
| Item29 | 0.07 | 1.07 | 0.45 | 1.09 | 0.66 | 4.40 | 5 | 0.00 | 0.49 | 0.29 | 5 | 0.00 | 1.00 | 22.98 | 22 | 0.02 | 0.40 |
| Item30 | 0.10 | 0.69 | -0.31 | 1.01 | 0.13 | 2.78 | 2 | 0.05 | 0.25 | 2.34 | 2 | 0.04 | 0.31 | 18.76 | 16 | 0.04 | 0.28 |
| Item31 | -0.10 | 1.17 | 0.78 | 1.08 | 0.62 | 11.41 | 6 | 0.08 | 0.08 | 8.70 | 6 | 0.06 | 0.19 | 22.78 | 24 | 0.00 | 0.53 |
| Item32 | 0.24 | 0.63 | -0.33 | 0.96 | -0.15 | 4.44 | 3 | 0.06 | 0.22 | 6.73 | 3 | 0.09 | 0.08 | 9.92 | 14 | 0.00 | 0.77 |
| Item33 | 0.18 | 0.93 | -0.32 | 0.96 | -0.25 | 18.18 | 7 | 0.11 | 0.01 | 17.84 | 7 | 0.11 | 0.01 | 35.96 | 23 | 0.06 | 0.04 |
| Item34 | 0.16 | 0.64 | -0.19 | 0.99 | 0.02 | 13.31 | 5 | 0.11 | 0.02 | 15.83 | 5 | 0.13 | 0.01 | 24.20 | 15 | 0.07 | 0.06 |
| Item35 | -0.01 | 1.05 | 0.30 | 1.06 | 0.51 | 4.75 | 4 | 0.04 | 0.31 | 2.23 | 4 | 0.00 | 0.69 | 25.69 | 20 | 0.05 | 0.18 |
| Item36 | 0.05 | 0.82 | -0.08 | 1.00 | 0.07 | 0.62 | 2 | 0.00 | 0.73 | 1.35 | 2 | 0.00 | 0.51 | 13.35 | 13 | 0.01 | 0.42 |
| Item37 | 0.27 | 0.88 | -0.48 | 0.97 | -0.21 | 3.44 | 4 | 0.00 | 0.49 | 2.06 | 4 | 0.00 | 0.72 | 21.76 | 22 | 0.00 | 0.47 |
| Item38 | 0.14 | 0.98 | -0.03 | 0.99 | -0.04 | 9.62 | 6 | 0.07 | 0.14 | 7.78 | 6 | 0.05 | 0.25 | 25.39 | 24 | 0.02 | 0.38 |
| Item39 | 0.03 | 1.04 | 0.24 | 0.96 | -0.19 | 11.12 | 5 | 0.09 | 0.05 | 12.98 | 5 | 0.11 | 0.02 | 30.79 | 21 | 0.06 | 0.08 |
| Item40 | 0.05 | 1.12 | 0.63 | 1.07 | 0.53 | 7.07 | 5 | 0.05 | 0.22 | 10.05 | 5 | 0.09 | 0.07 | 36.61 | 23 | 0.07 | 0.04 |
| Item41 | 0.15 | 1.03 | 0.19 | 1.05 | 0.41 | 2.41 | 2 | 0.04 | 0.30 | 3.55 | 2 | 0.07 | 0.17 | 19.04 | 19 | 0.00 | 0.45 |
| Item42 | 0.21 | 0.92 | -0.33 | 0.96 | -0.27 | 9.49 | 4 | 0.10 | 0.05 | 7.77 | 3 | 0.11 | 0.05 | 12.59 | 21 | 0.00 | 0.92 |
| Item43 | 0.03 | 1.17 | 0.58 | 0.93 | -0.47 | 4.98 | 3 | 0.07 | 0.17 | 6.88 | 3 | 0.10 | 0.08 | 39.71 | 19 | 0.09 | 0.00 |
| Item44 | 0.20 | 0.83 | -0.18 | 0.95 | -0.23 | 2.62 | 3 | 0.00 | 0.45 | 2.17 | 3 | 0.00 | 0.54 | 13.49 | 16 | 0.00 | 0.64 |
| Item45 | 0.23 | 0.81 | -0.77 | 0.90 | -0.66 | 29.11 | 6 | 0.17 | 0.00 | 25.78 | 4 | 0.20 | 0.00 | 41.63 | 20 | 0.09 | 0.00 |
| Item46 | 0.09 | 0.48 | -0.30 | 0.91 | -0.20 | 3.17 | 0 | NaN | NaN | 2.21 | 0 | NaN | NaN | 9.75 | 6 | 0.07 | 0.14 |
| Item47 | 0.27 | 0.87 | -0.61 | 0.94 | -0.39 | 4.49 | 4 | 0.03 | 0.34 | 7.16 | 4 | 0.08 | 0.13 | 26.33 | 21 | 0.04 | 0.19 |
| Item48 | -0.07 | 1.26 | 1.30 | 1.16 | 1.19 | 7.28 | 5 | 0.06 | 0.20 | 6.81 | 5 | 0.05 | 0.24 | 25.21 | 23 | 0.03 | 0.34 |
| Item49 | 0.07 | 0.99 | 0.04 | 1.04 | 0.34 | 3.68 | 4 | 0.00 | 0.45 | 4.31 | 4 | 0.02 | 0.37 | 24.37 | 21 | 0.03 | 0.28 |
| Item50 | 0.05 | 1.03 | 0.21 | 1.10 | 0.78 | 3.61 | 4 | 0.00 | 0.46 | -0.98 | 4 | 0.00 | 1.00 | 26.66 | 22 | 0.04 | 0.22 |
| Item51 | 0.21 | 0.90 | -0.54 | 0.99 | -0.07 | 3.23 | 5 | 0.00 | 0.66 | 4.22 | 5 | 0.00 | 0.52 | 23.17 | 20 | 0.03 | 0.28 |
| Item52 | 0.07 | 1.01 | 0.13 | 0.98 | -0.10 | 7.36 | 4 | 0.08 | 0.12 | 2.59 | 3 | 0.00 | 0.46 | 28.43 | 23 | 0.04 | 0.20 |
| Item53 | 0.25 | 0.84 | -0.67 | 1.03 | 0.25 | 7.01 | 2 | 0.13 | 0.03 | 6.25 | 2 | 0.12 | 0.04 | 18.47 | 15 | 0.04 | 0.24 |
| Item54 | 0.13 | 0.96 | -0.19 | 1.01 | 0.11 | 10.85 | 4 | 0.11 | 0.03 | 3.96 | 2 | 0.08 | 0.14 | 24.66 | 22 | 0.03 | 0.31 |
| Item55 | 0.03 | 1.04 | 0.26 | 1.01 | 0.14 | 7.59 | 6 | 0.04 | 0.27 | 6.17 | 6 | 0.01 | 0.40 | 35.51 | 26 | 0.05 | 0.10 |
| Item56 | 0.29 | 0.86 | -0.41 | 1.04 | 0.35 | 3.33 | 3 | 0.03 | 0.34 | 2.45 | 3 | 0.00 | 0.49 | 25.17 | 20 | 0.04 | 0.20 |
| Item57 | 0.02 | 1.11 | 0.50 | 0.97 | -0.18 | 4.17 | 4 | 0.02 | 0.38 | 4.96 | 4 | 0.04 | 0.29 | 19.72 | 22 | 0.00 | 0.60 |
| Item58 | 0.08 | 1.04 | 0.30 | 1.09 | 0.71 | 6.42 | 3 | 0.09 | 0.09 | 3.34 | 3 | 0.03 | 0.34 | 21.60 | 19 | 0.03 | 0.30 |
| Item59 | 0.07 | 0.93 | -0.14 | 0.96 | -0.24 | 2.97 | 3 | 0.00 | 0.40 | 3.02 | 3 | 0.01 | 0.39 | 34.29 | 19 | 0.08 | 0.02 |
| Item60 | -0.01 | 1.32 | 1.66 | 0.97 | -0.19 | 10.95 | 4 | 0.11 | 0.03 | 15.52 | 4 | 0.14 | 0.00 | 23.56 | 21 | 0.03 | 0.31 |
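Given itemfit output like the table above, misfitting items can be screened programmatically, for example by the p-value of the \(S\text{-}X^2\) statistic at a conventional \(\alpha = 0.05\). The sketch below uses a few rows of the p.S_X2 column:

```python
# (item, p.S_X2) pairs taken from the itemfit table above
s_x2_p = {
    "Item1": 0.52, "Item14": 0.00, "Item20": 0.04, "Item27": 0.03,
    "Item43": 0.00, "Item45": 0.00, "Item59": 0.02, "Item60": 0.31,
}

# Items with p < .05 show significant misfit and are candidates for removal
misfit = sorted(item for item, p in s_x2_p.items() if p < 0.05)
print(misfit)  # ['Item14', 'Item20', 'Item27', 'Item43', 'Item45', 'Item59']
```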
| row | V1 | theta |
|---|---|---|
| 255 | 3.570555 | -1.46 |
| 319 | 3.701036 | -0.82 |
| 301 | 5.561037 | -1.00 |
| 318 | 3.570588 | -0.83 |
| 526 | 1.004998 | 1.25 |
| 278 | 2.939597 | -1.23 |
| 366 | 7.947952 | -0.35 |
| 150 | 0.8680328 | -2.51 |
| 392 | 4.012261 | -0.09 |
| 317 | 6.87392 | -0.84 |
| 315 | 7.456777 | -0.86 |
| 170 | 0.6458577 | -2.31 |
| 260 | 2.611754 | -1.41 |
| 343 | 1.476888 | -0.58 |
| 256 | 2.818058 | -1.45 |
| 351 | 4.911946 | -0.50 |
| 318 | 9.975786 | -0.83 |
| 347 | 3.890309 | -0.54 |
| 168 | 1.432936 | -2.33 |
| 182 | 2.758194 | -2.19 |
| 255 | 1.349933 | -1.46 |
| 158 | 1.182987 | -2.43 |
| 438 | 3.829325 | 0.37 |
| 232 | 2.632179 | -1.69 |
| 533 | 1.119239 | 1.32 |
| 327 | 1.284963 | -0.74 |
| 190 | 1.378351 | -2.11 |
| 398 | 2.550218 | -0.03 |
| 415 | 2.85268 | 0.14 |
| 170 | 2.084035 | -2.31 |
| 392 | 2.056891 | -0.09 |
| 249 | 4.353934 | -1.52 |
| 401 | 2.23859 | 0.00 |
| 258 | 3.233172 | -1.43 |
| 506 | 1.891507 | 1.05 |
| 215 | 3.293347 | -1.86 |
| 402 | 3.291923 | 0.01 |
| 507 | 1.719203 | 1.06 |
| 423 | 2.492895 | 0.22 |
| 471 | 2.756464 | 0.70 |
| 428 | 4.29169 | 0.27 |
| 442 | 1.996986 | 0.41 |
| 383 | 4.229973 | -0.18 |
| 416 | 4.948164 | 0.15 |
| 482 | 3.182825 | 0.81 |
| 202 | 3.934302 | -1.99 |
| 446 | 3.756569 | 0.45 |
| 469 | 2.021608 | 0.68 |
| 465 | 2.243699 | 0.64 |
| 439 | 2.011606 | 0.38 |
| 418 | 3.377894 | 0.17 |
| 541 | 1.556218 | 1.40 |
| 440 | 5.567725 | 0.39 |
| 463 | 2.516734 | 0.62 |
| 556 | 0.9799728 | 1.55 |
| 427 | 4.824898 | 0.26 |
| 309 | 1.98861 | -0.92 |
| 487 | 2.431397 | 0.86 |
| 379 | 2.383984 | -0.22 |
| 465 | 2.915979 | 0.64 |
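If V1 is read as each item's peak information and theta as the ability level where that peak occurs, a 25-30 item shortlist can be drawn by ranking items on V1. The sketch below is illustrative only (that reading of the output, and the subset of values used, are my assumptions):

```python
# Hypothetical subset of (row, V1) pairs from the output above,
# assuming V1 is the maximum of an item's information curve
peak_info = [(255, 3.570555), (319, 3.701036), (301, 5.561037),
             (526, 1.004998), (366, 7.947952), (150, 0.8680328)]

# Rank by information and keep the most informative items (top 3 here for brevity)
top = sorted(peak_info, key=lambda t: t[1], reverse=True)[:3]
print([row for row, _ in top])  # [366, 301, 319]
```

For the real recommendation, the same ranking over all 60 items, cross-checked against the itemfit flags above, would select the 25-30 items to keep.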